Histograms Without Storing Observations
Authors
Abstract
A heuristic algorithm is proposed for dynamic calculation of the median and other quantiles. The estimates are produced dynamically as the observations are generated. The observations are not stored; therefore, the algorithm has a very small and fixed storage requirement regardless of the number of observations. This makes it ideal for implementing in a quantile chip that can be used in industrial controllers and recorders. The algorithm is further extended to histogram plotting. The accuracy of the algorithm is analyzed.
1. INTRODUCTION
In the field of simulation modeling, there is a trend toward reporting medians or other quantiles rather than mean and standard deviation alone. (The p-quantile of a distribution is defined as the value below which 100p percent of the distribution lies.) However, unlike the mean and standard deviation, calculation of quantiles requires several passes through the data, and therefore, the observations have to be stored. Further, a large number of observations is required to get a good esti
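To make the fixed-storage idea concrete, here is a minimal Python sketch of a streaming p-quantile estimator that keeps only five markers, however many observations arrive. The excerpt does not spell out the update rules, so the marker scheme below (parabolic adjustment with a linear fallback, in the style of the commonly described P² method) is an assumption and not necessarily the authors' exact procedure. The names `StreamingQuantile`, `add`, and `value` are hypothetical.

```python
import random


class StreamingQuantile:
    """Five-marker streaming estimate of the p-quantile (assumed P^2-style rules)."""

    def __init__(self, p):
        self.p = p
        self.q = []                                    # marker heights
        self.n = [0, 1, 2, 3, 4]                       # actual marker positions
        self.want = [0, 2 * p, 4 * p, 2 + 2 * p, 4]    # desired marker positions
        self.step = [0, p / 2, p, (1 + p) / 2, 1]      # growth of desired positions

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                                 # warm-up: keep first five sorted
            q.append(x)
            q.sort()
            return
        # Locate the cell k containing x, extending the extreme markers if needed.
        if x < q[0]:
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        for i in range(k + 1, 5):                      # markers above x shift right
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        # Nudge the three interior markers toward their desired positions.
        for i in (1, 2, 3):
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                cand = self._parabolic(i, s)
                if not q[i - 1] < cand < q[i + 1]:     # fall back to linear step
                    cand = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                q[i] = cand
                n[i] += s

    def _parabolic(self, i, s):
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def value(self):
        if not self.q:
            return None
        if len(self.q) < 5:                            # too few points: use sorted sample
            return self.q[int(self.p * (len(self.q) - 1))]
        return self.q[2]                               # middle marker tracks the p-quantile


if __name__ == "__main__":
    random.seed(0)
    est = StreamingQuantile(0.5)
    for _ in range(10_000):
        est.add(random.random())
    print(est.value())  # expected to land near 0.5 for uniform data
```

The state is five heights and five positions, so memory use stays constant as the stream grows. The estimate is heuristic: it approximates the quantile and does not return the exact order statistic.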
Similar resources
Stream Quantiles via Maximal Entropy Histograms
We address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We (i) highlight the limitations of approaches previously described in the literature which make them unsuitable for non-stationary streams, (ii) describe a novel principle for the utilization of the available storage space, and (iii) introduce two novel algorithms whi...
Similarity of Color Images
We describe two new color indexing techniques. The first one is a more robust version of the commonly used color histogram indexing. In the index we store the cumulative color histograms. The L1-, L2-, or L∞-distance between two cumulative color histograms can be used to define a similarity measure of these two color distributions. We show that while this method produces only slightly better re...
Building Wavelet Histograms on Large Data in MapReduce
MapReduce is becoming the de facto framework for storing and processing massive data, due to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact accurate summary of data is essential. Among various data summarization tools, histograms have proven to be particularly important and useful for summarizing data, and the wavelet histogram is one...
Tensor Decompositions for Integral Histogram Compression and Look-Up
Histograms are a fundamental tool for multidimensional data analysis and processing, and many applications in graphics and visualization rely on computing histograms over large regions of interest (ROI). Integral histograms (IH) greatly accelerate the calculation in the case of rectangular regions, but come at a large extra storage cost. Based on the tensor train decomposition model, we propose...
A successively refinable wavelet-based representation for content-based image retrieval
Content-based retrieval of image and video data from databases is a very challenging problem, whose interest is derived from the need of future databases to support efficient access to vast amounts of visual information. Typical queries to be performed in this context check attributes of objects present in image data, such as shape, color, relative locations, etc. Therefore, the way in which imag...